Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora
نویسندگان
چکیده
Synthetic speech usually suffers from bad F0 contour surface. The prediction of the underlying pitch targets robustly relies on the quality of the predicted prosodic structures, i.e. the corresponding sequences of tones and breaks. In the present work, we have utilized a linguistically enriched annotated corpus to build data-driven models for predicting prosodic structures with increased accuracy. We have then used a linear regression approach for the F0 modeling. An appropriate XML annotation scheme has been introduced to encode syntax, grammar, new or already given information, phrase subject/object information, as well as rhetorical elements in the corpus, by exploiting a Natural Language Generator (NLG) system. To prove the benefits from the introduction of the enriched input meta-information, we first show that while tone and break CART predictors have high accuracy when standing alone (92.35% for breaks, 87.76% for accents and 99.03% for endtones), their application in the TtS chain degrades the Linear Regression pitch target model. On the other hand, the enriched linguistic meta-information minimizes errors of models leading to a more natural F0 surface. Both objective and subjective evaluation were adopted for the intonation contours by taking into account the propagated errors introduced by each model in the synthesis chain. key words: prosody modeling, text-to-speech, linguistic meta-information, synthetic prosody evaluation
منابع مشابه
Modeling Prosodic Structures in Linguistically Enriched Environments
A significant challenge in Text-to-Speech (TtS) synthesis is the formulation of the prosodic structures (phrase breaks, pitch accents, phrase accents and boundary tones) of utterances. The prediction of these elements robustly relies on the accuracy and the quality of error-prone linguistic procedures, such as the identification of the part-of-speech and the syntactic tree. Additional linguisti...
متن کاملAdvanced unsupervised joint prosody labeling and modeling for Mandarin speech and its application to prosody generation for TTS
Motivated by the success of the unsupervised joint prosody labeling and modeling (UJPLM) method for Mandarin speech on modeling of syllable pitch contour in our previous study, in this paper, the advanced UJPLM (A-UJPLM) method is proposed based on UJPLM to jointly label prosodic tags and model syllable pitch contour, duration and energy level. Experimental results on the Sinica Treebank corpus...
متن کاملA Framework for Language-Independent Analysis and Prosodic Feature Annotation of Text Corpora
Concept-to-Speech systems include Natural Language Generators that produce linguistically enriched text descriptions which can lead to significantly improved quality of speech synthesis. There are cases, however, where either the generator modules produce pieces of non-analyzed, non-annotated plain text, or such modules are not available at all. Moreover, the language analysis is restricted by ...
متن کاملLinguistically Annotated Learner Corpora: Aspects of a Layered Linguistic Encoding and Standardized Representation
Linguistically annotated corpora that are stored in standardized digital form can be a valuable source of empirical insight. They can help verify linguistic generalizations and support the formulation of new hypotheses. The linguistic annotation of such corpora often is crucial for their effective exploration from a linguistic perspective. The annotation essentially serves as an index to the li...
متن کاملProsody Modeling in Concept-to-Speech Generation: Methodological Issues
We explore three issues for the development of Concept-to-Speech (CTS) systems. We identify information available in a language generation system that has the potential to impact prosody; investigate the role played by different corpora in CTS prosody modeling; and explore different methodologies for learning how linguistic features impact prosody. Our major focus is on the comparison of two ma...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEICE Transactions
دوره 88-D شماره
صفحات -
تاریخ انتشار 2005